Conversation

@bulleting0724 bulleting0724 commented Jan 1, 2025

Description

This PR fixes an issue where, when cs-agent throws an exception during the SSL handshake, the TCP connection between cs-server and cs-agent is not closed, which in turn causes the server thread to hang forever.

When the SSL handshake is at the client key exchange phase, the server waits for the agent to respond. On the agent side, however, an exception can occur when the agent cannot accept the cipher suite that the server offers, so the agent never sends the client key to the server. As a result, the handshake thread on the server side blocks forever in a function that expects to read packets from the SocketChannel.
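
The cleanup idea can be illustrated with a minimal sketch. This is not the actual NioClient code: the `Handshake` interface, the class name, and the method names are hypothetical stand-ins, and the "handshake" here simply throws. The sketch only shows that closing the SocketChannel when the handshake fails lets the peer observe end-of-stream instead of waiting on a read that never returns.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

final class HandshakeCleanupSketch {

    /** Hypothetical stand-in for the SSL handshake step. */
    interface Handshake {
        void run(SocketChannel channel) throws IOException;
    }

    /** Connects, runs the handshake, and closes the channel if the handshake fails. */
    static SocketChannel connect(InetSocketAddress address, Handshake handshake) throws IOException {
        SocketChannel channel = SocketChannel.open(address);
        try {
            handshake.run(channel);
            return channel;
        } catch (IOException e) {
            try {
                channel.close(); // actively close so the peer is not left waiting
            } catch (IOException closeError) {
                e.addSuppressed(closeError); // keep the original failure visible
            }
            throw e;
        }
    }

    /** Returns true if the accepting side sees end-of-stream after a failed handshake. */
    static boolean serverSeesEofAfterFailedHandshake() throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            InetSocketAddress address = (InetSocketAddress) server.getLocalAddress();
            try {
                connect(address, ch -> {
                    throw new IOException("simulated handshake failure");
                });
                return false; // the simulated handshake should never succeed
            } catch (IOException expected) {
                // the client channel has been closed by connect()
            }
            try (SocketChannel accepted = server.accept()) {
                // a blocking read returns -1 once the peer has closed its side
                return accepted.read(ByteBuffer.allocate(1)) == -1;
            }
        }
    }
}
```

Attaching a failed close via `addSuppressed` is one design choice among several; logging the close failure would work as well.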

Steps to reproduce this issue
1. Make sure the server uses a 1024-bit RSA public key. You can verify this by running "keytool -list -storepass $keystore_password -keystore $keystore_file -v".
2. Find "Subject Public Key Algorithm" in the output of step 1.
3. On the agent, edit "JAVA_HOME/jre/lib/security/java.security" and append "RSA keySize < 2048" to jdk.tls.disabledAlgorithms.
4. Restart cloudstack-agent.
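
As a rough illustration of why these steps trigger the failure, the sketch below checks whether an RSA public key falls under a constraint such as "RSA keySize < 2048". It is an assumption-laden example: the class and method names are hypothetical, the key is generated in memory rather than read from the server keystore, and the real check is performed inside the JDK's TLS implementation, not by code like this.

```java
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.NoSuchAlgorithmException;
import java.security.PublicKey;
import java.security.interfaces.RSAPublicKey;

final class RsaKeySizeSketch {

    /** Returns the RSA modulus length in bits, or -1 if the key is not an RSA key. */
    static int rsaKeySize(PublicKey key) {
        if (key instanceof RSAPublicKey) {
            return ((RSAPublicKey) key).getModulus().bitLength();
        }
        return -1;
    }

    /** Mirrors the effect of a constraint like "RSA keySize < 2048" for a single key. */
    static boolean isBelowMinimum(PublicKey key, int minimumBits) {
        int size = rsaKeySize(key);
        return size >= 0 && size < minimumBits;
    }

    /** Generates a throwaway RSA key pair of the given size for demonstration. */
    static KeyPair generateRsa(int bits) throws NoSuchAlgorithmException {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(bits);
        return generator.generateKeyPair();
    }
}
```

With a 1024-bit key, `isBelowMinimum(key, 2048)` returns true, which corresponds to the situation in which the agent rejects the server's key during the handshake.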

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI
  • test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

How Has This Been Tested?

Before I applied this change, in this situation the TCP connection on the agent side stayed in the CLOSE_WAIT state forever. With this change applied, the agent actively closes the channel, which in turn closes the TCP connection, and the connection moves to TIME_WAIT, a normal state indicating that the connection is closing.

boring-cyborg bot commented Jan 1, 2025

Congratulations on your first Pull Request and welcome to the Apache CloudStack community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/cloudstack/blob/main/CONTRIBUTING.md).
Here are some useful points:

codecov bot commented Jan 1, 2025

Codecov Report

❌ Patch coverage is 0% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 16.07%. Comparing base (fd24509) to head (3f3f7d0).
⚠️ Report is 807 commits behind head on main.

Files with missing lines                                Patch %   Lines
...s/src/main/java/com/cloud/utils/nio/NioClient.java   0.00%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##               main   #10153   +/-   ##
=========================================
  Coverage     16.07%   16.07%           
- Complexity    12885    12886    +1     
=========================================
  Files          5642     5642           
  Lines        494039   494040    +1     
  Branches      59912    59912           
=========================================
+ Hits          79408    79414    +6     
+ Misses       405828   405822    -6     
- Partials       8803     8804    +1     
Flag        Coverage Δ
uitests     4.01% <ø> (ø)
unittests   16.92% <0.00%> (+<0.01%) ⬆️

Contributor

@shwstppr shwstppr left a comment

    // remaining task done
    task = _factory.create(Task.Type.CONNECT, link, null);
} catch (final GeneralSecurityException e) {
    _selector.close();
Contributor

here as well?

@DaanHoogland
Contributor

Thanks for your contribution, @bulleting0724.

Can you have a quick look at the code @shwstppr is pointing at? He uses an extra null check and isOpen() to make sure the close() is valid and needed. Alternatively, we can wait for his bigger change if you agree that it solves your problem as well.
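
For readers following along, the guarded close described above could look roughly like the sketch below. It is an illustration only and not the code from the referenced change: `SafeCloseSketch` and `closeIfOpen` are hypothetical names, and the helper is limited to `java.nio.channels.Channel` instances.

```java
import java.io.IOException;
import java.nio.channels.Channel;

final class SafeCloseSketch {

    /**
     * Closes the channel only if it is non-null and still open.
     * Returns true only when this call actually closed the channel.
     */
    static boolean closeIfOpen(Channel channel) {
        if (channel == null || !channel.isOpen()) {
            return false; // nothing to close
        }
        try {
            channel.close();
            return true;
        } catch (IOException e) {
            // report the close failure without throwing from a cleanup path
            System.err.println("Failed to close channel: " + e.getMessage());
            return false;
        }
    }
}
```

Calling `closeIfOpen` from a catch block keeps the cleanup from throwing a second exception that would hide the original failure.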

github-actions bot commented Feb 1, 2025

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.

Contributor

Copilot AI left a comment

Pull Request Overview

This PR addresses an SSL handshake issue whereby the TCP connection remained open, causing a server thread to hang during a failed handshake.

  • Actively closes the client connection when an IOException occurs during handshake initialization
  • Ensures the selector is closed following the connection close

    _selector.close();
    throw new IOException("Failed to initialise security", e);
} catch (final IOException e) {
    _clientConnection.close();

Copilot AI Jun 5, 2025

Consider wrapping _clientConnection.close() in a try-catch block to ensure that any exception thrown during the close operation does not obscure the original IOException and to better manage resource cleanup.

Suggested change
- _clientConnection.close();
+ try {
+     _clientConnection.close();
+ } catch (final IOException closeException) {
+     logger.error("Failed to close _clientConnection", closeException);
+ }

@DaanHoogland
Contributor

Had to do something similar here, https://github.com/apache/cloudstack/pull/9840/files#diff-a96656b544a2f0b6dd8c2ed1b2232a1b8023a22025e0bf47232fe81b4be64b96

As that is merged, @shwstppr, is this PR still valid?

@shwstppr
Contributor

@DaanHoogland cc @bulleting0724 I think this was covered in #9840 and is present in the main branch now.

@DaanHoogland
Contributor

Closing. @bulleting0724, please reopen or create a new PR if you feel there is still an issue.

@github-project-automation github-project-automation bot moved this from In Progress to Done in Apache CloudStack 4.22.0 Oct 13, 2025